231 research outputs found
Global Optimization for Value Function Approximation
Existing value function approximation methods have been successfully used in
many applications, but they often lack useful a priori error bounds. We propose
a new approximate bilinear programming formulation of value function
approximation, which employs global optimization. The formulation provides
strong a priori guarantees on both robust and expected policy loss by
minimizing specific norms of the Bellman residual. Solving a bilinear program
optimally is NP-hard, but this is unavoidable because the Bellman-residual
minimization itself is NP-hard. We describe and analyze both optimal and
approximate algorithms for solving bilinear programs. The analysis shows that
this algorithm offers a convergent generalization of approximate policy
iteration. We also briefly analyze the behavior of bilinear programming
algorithms under incomplete samples. Finally, we demonstrate that the proposed
approach can consistently minimize the Bellman residual on simple benchmark
problems
Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs
Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in
solving decentralized POMDPs with large horizons. We generalize the algorithm
and improve its scalability by reducing the complexity with respect to the
number of observations from exponential to polynomial. We derive error bounds
on solution quality with respect to this new approximation and analyze the
convergence behavior. To evaluate the effectiveness of the improvements, we
introduce a new, larger benchmark problem. Experimental results show that
despite the high complexity of decentralized POMDPs, scalable solution
techniques such as MBDP perform surprisingly well.Comment: Appears in Proceedings of the Twenty-Third Conference on Uncertainty
in Artificial Intelligence (UAI2007
MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs
We present multi-agent A* (MAA*), the first complete and optimal heuristic
search algorithm for solving decentralized partially-observable Markov decision
problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for
computing optimal plans for a cooperative group of agents that operate in a
stochastic environment such as multirobot coordination, network traffic
control, `or distributed resource allocation. Solving such problems efiectively
is a major challenge in the area of planning under uncertainty. Our solution is
based on a synthesis of classical heuristic search and decentralized control
theory. Experimental results show that MAA* has significant advantages. We
introduce an anytime variant of MAA* and conclude with a discussion of
promising extensions such as an approach to solving infinite horizon problems.Comment: Appears in Proceedings of the Twenty-First Conference on Uncertainty
in Artificial Intelligence (UAI2005
Real-time robot deliberation by compilation and monitoring of anytime algorithms
Anytime algorithms are algorithms whose quality of results improves gradually as computation time increases. Certainty, accuracy, and specificity are metrics useful in anytime algorighm construction. It is widely accepted that a successful robotic system must trade off between decision quality and the computational resources used to produce it. Anytime algorithms were designed to offer such a trade off. A model of compilation and monitoring mechanisms needed to build robots that can efficiently control their deliberation time is presented. This approach simplifies the design and implementation of complex intelligent robots, mechanizes the composition and monitoring processes, and provides independent real time robotic systems that automatically adjust resource allocation to yield optimum performance
Optimizing Memory-Bounded Controllers for Decentralized POMDPs
We present a memory-bounded optimization approach for solving
infinite-horizon decentralized POMDPs. Policies for each agent are represented
by stochastic finite state controllers. We formulate the problem of optimizing
these policies as a nonlinear program, leveraging powerful existing nonlinear
optimization techniques for solving the problem. While existing solvers only
guarantee locally optimal solutions, we show that our formulation produces
higher quality controllers than the state-of-the-art approach. We also
incorporate a shared source of randomness in the form of a correlation device
to further increase solution quality with only a limited increase in space and
time. Our experimental results show that nonlinear optimization can be used to
provide high quality, concise solutions to decentralized decision problems
under uncertainty.Comment: Appears in Proceedings of the Twenty-Third Conference on Uncertainty
in Artificial Intelligence (UAI2007
- …